library(plotly)
library(data.table)
library(tidyr)
library(knitr)
library(heatmaply)
All genres:
## [1] "Satire" "SciFi" "Drama" "Action"
## [5] "Romance" "Mystery" "Horror" "Self.help"
## [9] "Health" "Guide" "Travel" "Children.s"
## [13] "Religion" "Science" "History" "Math"
## [17] "Anthology" "Poetry" "Encyclopedias" "Dictionaries"
## [21] "Comics" "Art" "Cookbooks" "Diaries"
## [25] "Journals"
## [1] TRUE
data.table
head(books_dt)
## genreA genreB customers
## 1: Satire Satire 3798
## 2: SciFi Satire 423
## 3: Drama Satire 19
## 4: Action Satire 343
## 5: Romance Satire 505
## 6: Mystery Satire 227
## [1] 2.332187
Hypothesis
Look for customers that buy only one genre
column sum and 2*diagonal value
generate table with {genre, {2*diagonal-colSum}}
Satire and Travel are rather bought in pairs
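The {genre, 2*diagonal-colSum} table described above could be built roughly as follows. This is only a sketch: `toy_dt` and its numbers are made up for illustration and merely reuse the column names of `books_dt`; the names `col_sum`, `diagonal` and `score` are assumptions, not part of the original analysis.

```r
library(data.table)

# Hypothetical toy co-purchase table with the same columns as books_dt
toy_dt <- data.table(
  genreA    = rep(c("Satire", "SciFi", "Action"), times = 3),
  genreB    = rep(c("Satire", "SciFi", "Action"), each = 3),
  customers = c(100,  10,   5,   # genreB = Satire
                 10, 200, 150,   # genreB = SciFi
                  5, 150, 180)   # genreB = Action
)

# Column sum per genre (all customers counts that share the same genreB)
col_sums <- toy_dt[, .(col_sum = sum(customers)), by = genreB]

# Diagonal value per genre (rows where genreA equals genreB)
diag_vals <- toy_dt[genreA == genreB, .(genreB, diagonal = customers)]

# score = 2 * diagonal - colSum, i.e. diagonal minus all off-diagonal counts
single_dt <- merge(col_sums, diag_vals, by = "genreB")
single_dt[, score := 2 * diagonal - col_sum]
single_dt <- single_dt[order(-score)]

print(single_dt)
```

A high score suggests a genre whose off-diagonal counts are small compared with its diagonal, which is the pattern the hypothesis is looking for.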
## genreA genreB customers rel_customers
## 1: Action Satire 343 0.0069304130
## 2: Action SciFi 44698 0.9031358603
## 3: Action Drama 23 0.0004647216
## 4: Action Action 49492 1.0000000000
## 5: Action Romance 15685 0.3169199062
## 6: Action Mystery 1599 0.0323082518
–> rel_customers: customers of the (genreA, genreB) pair divided by the genreA diagonal value
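A minimal sketch of how such a relative column might be computed with data.table, using only the six Action rows printed above. The object name `action_dt` is an assumption, and the original code may well have done this differently.

```r
library(data.table)

# The six Action rows shown in the output above
action_dt <- data.table(
  genreA    = "Action",
  genreB    = c("Satire", "SciFi", "Drama", "Action", "Romance", "Mystery"),
  customers = c(343, 44698, 23, 49492, 15685, 1599)
)

# Within each genreA, divide every count by that genre's diagonal value
action_dt[, rel_customers := customers / customers[genreA == genreB], by = genreA]

print(action_dt)
```

Grouping by `genreA` means the same line would also work on the full table, as long as every genre has exactly one diagonal row.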
Look at all data unsorted: No pattern.
With clustering of rows and columns (Note: rows and columns are now ordered differently):
–> about 20% of customers additionally bought SciFi and Romance
–> Math is poetry and History is Science fiction!
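The unsorted and clustered views could be produced along these lines. This is a hedged sketch on made-up data: `toy_dt`, `wide` and `mat` are hypothetical names, and the heatmaply calls are guarded so the matrix step still runs where the package is not installed.

```r
library(data.table)

# Hypothetical toy co-purchase table with the same columns as books_dt
toy_dt <- data.table(
  genreA    = rep(c("Satire", "SciFi", "Action"), times = 3),
  genreB    = rep(c("Satire", "SciFi", "Action"), each = 3),
  customers = c(100, 10, 5, 10, 200, 150, 5, 150, 180)
)

# Long -> wide: one row per genreA, one column per genreB
wide <- dcast(toy_dt, genreA ~ genreB, value.var = "customers")
mat  <- as.matrix(wide[, !"genreA"])
rownames(mat) <- wide$genreA

if (requireNamespace("heatmaply", quietly = TRUE)) {
  # Unsorted view: keep the given row/column order, no dendrograms
  p_unsorted <- heatmaply::heatmaply(mat, Rowv = FALSE, Colv = FALSE,
                                     dendrogram = "none")
  # Clustered view: rows and columns are reordered by hierarchical clustering
  p_clustered <- heatmaply::heatmaply(mat)
}
```

Because rows and columns are clustered separately, the two axes of the clustered plot can end up in different orders.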